where $\lambda$ is a hyperparameter that balances the two terms. $H^l_c$ is the $c$th full-precision filter of the $l$th convolutional layer, and $\hat{H}^l_c$ denotes its corresponding reconstructed filter; $\mathrm{MSE}(\cdot)$ represents the mean squared error (MSE) loss. The second term minimizes the intraclass compactness, since the binarization process causes feature variations. $f_{C,s}(\hat{H})$ denotes the feature map of the last convolutional layer for the $s$th sample, and $\bar{f}_{C,s}(\hat{H})$ denotes the class-specific mean feature map for the corresponding samples. Combining $L_{\hat{H}}$ with the conventional loss $L_{CE}$, we obtain the final loss:
\begin{equation}
L = L_{CE} + L_{\hat{H}}.
\tag{4.18}
\end{equation}
The loss $L$ and its derivatives can be calculated directly using an efficient automatic differentiation package.
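As a concrete illustration, the following is a minimal PyTorch-style sketch of $L_{\hat{H}}$ and the final loss of Eq. (4.18). The function names (l_h_hat, total_loss), the placement of the balancing weight lam, and the use of per-class mean feature maps estimated within a mini-batch are illustrative assumptions, not the authors' exact implementation.

```python
import torch.nn.functional as F


def l_h_hat(fp_filters, rec_filters, features, labels, lam=1e-3):
    """Sketch of the kernel loss: filter-reconstruction MSE plus intraclass compactness."""
    # First term: MSE between each full-precision filter H^l_c and its
    # reconstructed counterpart \hat{H}^l_c, accumulated over layers.
    recon = sum(F.mse_loss(h_hat, h) for h, h_hat in zip(fp_filters, rec_filters))

    # Second term: pull the last-layer feature map f_{C,s} of every sample toward
    # the class-specific mean feature map. Class means are estimated within the
    # mini-batch here (an assumption made for this sketch).
    compact = features.new_zeros(())
    for c in labels.unique():
        class_feats = features[labels == c]
        class_mean = class_feats.mean(dim=0, keepdim=True)
        compact = compact + ((class_feats - class_mean) ** 2).sum()
    compact = compact / features.size(0)

    # lam plays the role of the balancing hyperparameter lambda; exactly where it
    # is applied is an assumption in this sketch.
    return recon + lam * compact


def total_loss(logits, labels, fp_filters, rec_filters, features):
    """Eq. (4.18): L = L_CE + L_H_hat; autograd supplies the derivatives."""
    return F.cross_entropy(logits, labels) + l_h_hat(
        fp_filters, rec_filters, features, labels
    )
```

Because both terms are built from differentiable tensor operations, calling backward() on the returned loss yields the gradients mentioned above without any hand-derived formulas.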
4.3.5 Ablation Study
We tested different values of $\beta_P$ for our method on the CIFAR-10 dataset, as shown on the right side of Fig. 4.9. We can see that as $\beta_P$ increases, the accuracy first increases but then decreases when $\beta_P \geq 2$. This validates that the performance loss between the Child and Parent models is a significant measure for the 1-bit CNN search. As $\beta_P$ increases, CP-NAS tends to select architectures with fewer convolutional operations, and the imbalance between the two terms in our CP model leads to a performance drop.
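To make the role of $\beta_P$ in this ablation concrete, the sketch below combines a Child model's accuracy with a $\beta_P$-weighted Parent-Child performance gap when scoring a candidate operation. The function name operation_score, the max-based gap, and the numeric values are illustrative assumptions, not the exact CP-NAS criterion.

```python
def operation_score(child_acc, parent_acc, beta_p):
    # Performance measure of the Child model, penalized by the beta_P-weighted
    # performance loss between the Parent and the Child (a simplified, assumed form).
    performance_loss = max(parent_acc - child_acc, 0.0)
    return child_acc - beta_p * performance_loss


# With beta_p = 0 the score reduces to the performance measure alone, which is how
# BNAS† is approximated in the ablation of Fig. 4.9.
print(operation_score(child_acc=0.92, parent_acc=0.94, beta_p=1.0))  # 0.90
```

Under this reading, a large $\beta_P$ heavily penalizes any operation whose Child model falls behind the Parent, which is consistent with the tendency toward fewer convolutional operations noted above.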
We also compare the architectures obtained by CP-NAS, Random, PC (PC-DARTS), and BNAS† as shown in Fig. 4.9. Unlike the full-precision case, Random and PC-DARTS lack the necessary guidance and therefore perform poorly for binarized architecture search. Both BNAS† and CP-NAS use an evaluation indicator for operation selection. In addition, our CP-NAS also employs the performance loss, which allows it to outperform the other three strategies.
Efficiency. As shown in XNOR, 1-bit CNNs are very efficient and promising for resource-limited devices. Our CP-NAS achieves performance comparable to that of the full-precision hand-crafted model, with up to an estimated 11× memory saving and 58× speedup, which is worth further research and will benefit extensive edge computing applications.
[Two panels: accuracy (%) on CIFAR-10 versus $\beta_P$ (0–5), and accuracy (%) for the search strategies Random, PC, BNAS†, and CP-NAS.]
FIGURE 4.9
The results (right) for different $\beta_P$ on CIFAR-10. The 1-bit CNN results (left) for different search strategies on CIFAR-10, including random search, PC (PC-DARTS), BNAS†, and CP-NAS. We approximately implement BNAS† by setting $\beta_P$ to 0 in CP-NAS, which means that we only use the performance measure for operation selection.